Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition
نویسندگان
چکیده
As a hot topic of speech signal processing, speech emotion recognition methods have been developed rapidly in recent years. Some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the feature distributions of different datasets often follow different distributions. These discrepancies will greatly affect the recognition performance. To tackle this problem, a novel corpus-invariant discriminant feature representation algorithm, called transfer discriminant analysis (TDA), is presented for speech emotion recognition. The basic idea of TDA is to integrate the kernel LDA algorithm and the similarity measurement of distributions into one objective function. Experimental results under the cross-corpus conditions show that our proposed method can significantly improve the recognition rates. key words: speech emotion recognition, transfer learning, dimensionality reduction
منابع مشابه
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملDimensionality reduction for speech emotion features by multiscale kernels
To achieve efficient and compact low-dimensional features for speech emotion recognition, this paper proposes a novel feature reduction method using multiscale kernels in the framework of graph embedding. With Fisher discriminant embedding graph, multiscale Gaussian kernels are used in constructing optimal linear combination of Gram matrices for multiple kernel learning. To evaluate the propose...
متن کاملDimension Reduction and Discriminant Analysis for Japanese Connected Vowel Recognition
The aim of speech recognition is to extract only the linguistic information from speech signals. The acoustic variations caused by non-linguistic factors, such as speaker, communication channel and noise, pose a challenging problem for speech recognition. The same text can lead to different acoustic observations due to different speakers and different environments. To deal with these variations...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملTowards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task Learning
One of the challenges in Speech Emotion Recognition (SER) “in the wild” is the large mismatch between training and test data (e.g. speakers and tasks). In order to improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL) and use gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and vari...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEICE Transactions
دوره 100-D شماره
صفحات -
تاریخ انتشار 2017